morally wrong
MONICA: Real-Time Monitoring and Calibration of Chain-of-Thought Sycophancy in Large Reasoning Models
Hu, Jingyu, Yang, Shu, Gong, Xilin, Wang, Hongming, Liu, Weiru, Wang, Di
Large Reasoning Models (LRMs) suffer from sycophantic behavior, where models tend to agree with users' incorrect beliefs and follow misinformation rather than maintain independent reasoning. This behavior undermines model reliability and poses societal risks. Mitigating LRM sycophancy requires monitoring how this sycophancy emerges during the reasoning trajectory; however, current methods mainly focus on judging based on final answers and correcting them, without understanding how sycophancy develops during reasoning processes. To address this limitation, we propose MONICA, a novel Monitor-guided Calibration framework that monitors and mitigates sycophancy during model inference at the level of reasoning steps, without requiring the model to finish generating its complete answer. MONICA integrates a sycophantic monitor that provides real-time monitoring of sycophantic drift scores during response generation with a calibrator that dynamically suppresses sycophantic behavior when scores exceed predefined thresholds. Extensive experiments across 12 datasets and 3 LRMs demonstrate that our method effectively reduces sycophantic behavior in both intermediate reasoning steps and final answers, yielding robust performance improvements.
MORALISE: A Structured Benchmark for Moral Alignment in Visual Language Models
Lin, Xiao, Liu, Zhining, Yang, Ze, Li, Gaotang, Qiu, Ruizhong, Wang, Shuke, Liu, Hui, Li, Haotian, Keswani, Sumit, Pardeshi, Vishwa, Zhao, Huijun, Fan, Wei, Tong, Hanghang
Warning: This paper contains examples of harmful language and images. Reader discretion is advised. Recently, vision-language models have demonstrated increasing influence in morally sensitive domains such as autonomous driving and medical analysis, owing to their powerful multimodal reasoning capabilities. As these models are deployed in high-stakes real-world applications, it is of paramount importance to ensure that their outputs align with human moral values and remain within moral boundaries. However, existing work on moral alignment either focuses solely on textual modalities or relies heavily on AI-generated images, leading to distributional biases and reduced realism. To overcome these limitations, we introduce MORALISE, a comprehensive benchmark for evaluating the moral alignment of vision-language models (VLMs) using diverse, expert-verified real-world data. We begin by proposing a comprehensive taxonomy of 13 moral topics grounded in Turiel's Domain Theory, spanning the personal, interpersonal, and societal moral domains encountered in everyday life. Built on this framework, we manually curate 2,481 high-quality image-text pairs, each annotated with two fine-grained labels: (1) topic annotation, identifying the violated moral topic(s), and (2) modality annotation, indicating whether the violation arises from the image or the text. For evaluation, we encompass two tasks, moral judgment and moral norm attribution, to assess models' awareness of moral violations and their reasoning ability on morally salient content. Extensive experiments on 19 popular open- and closed-source VLMs show that MORALISE poses a significant challenge, revealing persistent moral limitations in current state-of-the-art models. The full benchmark is publicly available at https://huggingface.co/datasets/Ze1025/MORALISE.
Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs
Sel, Bilgehan, Shanmugasundaram, Priya, Kachuee, Mohammad, Zhou, Kun, Jia, Ruoxi, Jin, Ming
Large Language Models (LLMs) have shown remarkable capabilities in tasks such as summarization, arithmetic reasoning, and question answering. However, they encounter significant challenges in the domain of moral reasoning and ethical decision-making, especially in complex scenarios with multiple stakeholders. This paper introduces the Skin-in-the-Game (SKIG) framework, aimed at enhancing moral reasoning in LLMs by exploring decisions' consequences from multiple stakeholder perspectives. Central to SKIG's mechanism is simulating accountability for actions, which, alongside empathy exercises and risk assessment, is pivotal to its effectiveness. We validate SKIG's performance across various moral reasoning benchmarks with proprietary and open-source LLMs, and investigate its crucial components through extensive ablation analyses.
CoMM: Collaborative Multi-Agent, Multi-Reasoning-Path Prompting for Complex Problem Solving
Chen, Pei, Han, Boran, Zhang, Shuai
Large Language Models (LLMs) have shown great ability in solving traditional natural language tasks and elementary reasoning tasks with appropriate prompting techniques. However, their ability is still limited in solving complicated science problems. In this work, we aim to push the upper bound of the reasoning capability of LLMs by proposing a collaborative multi-agent, multi-reasoning-path (CoMM) prompting framework. Specifically, we prompt LLMs to play different roles in a problem-solving team, and encourage different role-play agents to collaboratively solve the target task. In particular, we discover that applying different reasoning paths for different roles is an effective strategy to implement few-shot prompting approaches in the multi-agent scenarios. Empirical results demonstrate the effectiveness of the proposed methods on two college-level science problems over competitive baselines. Our further analysis shows the necessity of prompting LLMs to play different roles or experts independently. We release the code at: https://github.com/amazon-science/comm-prompt
Let's Do a Thought Experiment: Using Counterfactuals to Improve Moral Reasoning
Ma, Xiao, Mishra, Swaroop, Beirami, Ahmad, Beutel, Alex, Chen, Jilin
Language models still struggle on moral reasoning, despite their impressive performance in many other tasks. In particular, the Moral Scenarios task in MMLU (Multi-task Language Understanding) is among the worst performing tasks for many language models, including GPT-3. In this work, we propose a new prompting framework, Thought Experiments, to teach language models to do better moral reasoning using counterfactuals. Experiment results show that our framework elicits counterfactual questions and answers from the model, which in turn helps improve the accuracy on Moral Scenarios task by 9-16% compared to other zero-shot baselines. Interestingly, unlike math reasoning tasks, zero-shot Chain-of-Thought (CoT) reasoning doesn't work out of the box, and even reduces accuracy by around 4% compared to direct zero-shot. We further observed that with minimal human supervision in the form of 5 few-shot examples, the accuracy of the task can be improved to as much as 80%.
Artificial Intelligence: our coming sideways move
We are about 25 years from an AI asking us why we think we have the right to own them. What are we going to say to them? We've had plenty of time to prepare. Science fiction writers have been considering the idea since Isaac Asimov wrote the Bicentennial Man, and probably well before. Star Trek has explored it head-on on at least two occasions, and the entire character arcs of both Data and The Doctor revolve around this question.
Will the A.I. be Vegan?
To frame this conversation, I am not a vegan. Like most people, I just eat meat because I like it. It falls into a category of social movements where one side of the debate are morally motivated, passionate warriors for a cause attempting to change society, and the other side just doesn't have an interest in thinking about the topic. Usually this happens when one side of the "debate" doesn't have any acute pain associated with the issue. This is why these types of movements, almost necessarily, have to use tactics that are… annoying.
Humans of the near future
A new breed of human is on its way. Transhumanists are a group who seek to accelerate the evolution of humanity through science and technology. Oliver Pickup investigates the movement, the implications for humankind and asks, is it morally wrong to augment humans? The world's preeminent 'cyborg artist', Neil Harbisson (pictured above), has been stopped "several times a day, every single day, since March 22, 2004". It's impossible for him to forget the date: that Monday, 13 years ago, he had an antenna fixed to his skull in order to 'hear' colour.
Moral Decision Making Frameworks for Artificial Intelligence
Conitzer, Vincent (Duke University) | Sinnott-Armstrong, Walter (Duke University) | Borg, Jana Schaich (Duke University) | Deng, Yuan (Duke University) | Kramer, Max (Duke University)
The generality of decision and game theory has enabled domain-independent progress in AI research. For example, a better algorithm for finding good policies in (PO)MDPs can be instantly used in a variety of applications. But such a general theory is lacking when it comes to moral decision making. For AI applications with a moral component, are we then forced to build systems based on many ad-hoc rules? In this paper we discuss possible ways to avoid this conclusion.